Abstract

Object tracking is a long-standing problem in vision. While great efforts have been spent to improve tracking performance, a simple yet reliable piece of prior knowledge has been left unexploited: the target in tracking must be an object rather than a non-object. The recently proposed and popularized objectness measure provides a natural way to model such a prior in visual tracking. Thus motivated, in this paper we propose to adapt objectness for visual object tracking. Instead of directly applying an existing objectness measure, which is generic and handles various objects and environments, we adapt it to be compatible with the specific tracking sequence and object. More specifically, we use the newly proposed BING [7] objectness as the base, and then train an object-adaptive objectness for each tracking task. The training is implemented with an adaptive support vector machine that integrates information from the specific tracking target into the BING measure. We emphasize that the benefit of the proposed adaptive objectness, named ADOBING, is generic. To show this, we combine ADOBING with seven top-performing trackers from recent evaluations. We run the ADOBING-enhanced trackers and their base trackers on two popular benchmarks, the CVPR2013 benchmark (50 sequences) and the Princeton Tracking Benchmark (100 sequences). On both benchmarks, our methods not only consistently improve the base trackers, but also achieve the best known performance. Since the way we integrate objectness in visual tracking is generic and straightforward, we expect even more improvement from tracker-specific objectness.

1. Introduction 

Visual object tracking is a fundamental computer vision task with a wide range of applications in human-computer interaction, surveillance, vehicle navigation, etc. Various factors, including illumination changes, partial occlusions, pose variations and background clutter, challenge tracking algorithms in practice. To handle these factors, a great deal of effort has been devoted to developing robust observation models that exploit the local structure of the target and/or visual cues such as shape and appearance.

Figure 1. Improving tracking by integrating objectness. The results in the right frames show that when the base tracker (in green, Struck [14]) starts drifting, the proposed objectness-aware solution (in red, Struck+ADOBING) successfully avoids such pitfalls.

Despite a large amount of previous effort, little attention has been given to a simple yet reliable prior: a visual target under tracking should first be an object rather than a non-object. An obvious advantage of integrating such information is to inhibit drifting, as observed in our experiments (e.g., Fig. 1). This intuition naturally directs our attention to the recently proposed and popularized objectness measure [2], which estimates the likelihood that a given image window contains a whole object.

To apply objectness to visual object tracking, however, two issues need to be addressed. The first is speed, for which we fortunately have the newly developed fast objectness algorithm named BING [7]. The second is adaptivity: objectness was originally designed to handle generic objects in various environments, whereas in tracking we typically focus on a specific object in a relatively stable environment.

Guided by the above idea, in this paper we propose to integrate objectness into object tracking. First, we derive a novel adaptive objectness based on BING, named ADOBING, that adapts BING to a specific tracking task. In particular, given an image sequence and the initial object-of-interest, ADOBING is learned by an adaptive SVM that adjusts BING according to the tracked object and the background in the initial frame. This way, generic objectness is effectively balanced against the specific tracking task at hand, while the extreme computational efficiency of BING is inherited.

We then integrate ADOBING into existing visual trackers to show the general advantage of using objectness for tracking. Towards this goal, instead of designing a specific mechanism to improve a specific tracker, we employ a straightforward strategy, i.e., a linear combination of the original tracking confidence and ADOBING, to improve seven trackers (called base trackers) that have achieved top performance in recent tracking benchmarks [37, 25, 20]. We test the objectness-enhanced trackers on two recently proposed tracking benchmarks: the CVPR2013 tracking benchmark [37] and the Princeton Tracking Benchmark [27]. The results show not only consistent improvement over the base trackers from using objectness, but also the advantage of adapting the original objectness to a tracking-specific one. In addition, with the help of objectness, we achieve the best results ever reported on these benchmarks.

In summary, our contributions are three-fold: (1) integrating objectness into visual tracking, (2) developing a tracking-adaptive objectness, and (3) thorough experimental validation with state-of-the-art performance.

In the rest of the paper, we first summarize related work in Sec. 2. Then, we introduce the proposed adaptive objectness and its integration into tracking in Sec. 3. The experimental validation is described in Sec. 4, followed by the conclusion in Sec. 5.

2. Related work 

Visual Object Tracking. Visual tracking has been studied for several decades, and a comprehensive review is beyond the scope of this paper. Surveys of tracking algorithms can be found in [39], and more recent progress is covered by the tracking evaluation papers [37, 25, 26, 19]. In the following we review the most closely related work.

While image-based objectness is new to visual tracking, a related concept, visual saliency, has recently been connected to visual tracking in [23, 22, 30]. In particular, a biologically inspired object tracking algorithm was proposed in [23], which selects the most informative features by exploiting the connection between discriminant saliency and the Bayes error of target/background classification. The relationship between tracking reliability and the degree of target saliency was studied in [22] through human behavior experiments. In [30], to deal with abrupt motion, the target is relocated by searching the salient region of an adaptive saliency map when the target gets lost. It is also worth noting that [29] developed a motion saliency mechanism, which considers the specific motion of the target, to re-discover the tracked object. Although related to our work, none of the above studies takes into account the prior that a tracking target needs to be an object. As far as we know, our work is the first to explicitly model such a prior for visual object tracking.

Objectness. The concept of objectness was first proposed by Alexe et al. [2] to reflect the likelihood that an image window contains an entire object. The objectness estimator is trained using various image cues, such as multi-scale saliency, color contrast, edge density and superpixel straddling, to model regions that stand out from their surroundings and have a closed boundary. The objectness thus learned is generic and can be applied to many vision tasks to improve accuracy, speed, or both. As a result, it has recently been applied to a series of tasks such as object detection [5], image retargeting [31], action localization [16], salient region segmentation [17], scene classification [18], image retrieval [32], etc.

Ideas similar to objectness have also been developed for other vision tasks. For example, selective search based on hierarchical grouping with diversified criteria was proposed in [34] to generate high-quality object proposals. In [41], cascaded ranking SVMs were used to generate object proposals for object detection. In [8], regions from segmentation are further ranked with structure learning to produce object proposals.

Although the original objectness is effective for many vision tasks, its computational cost prevents its use in tasks that require high-speed responses, such as tracking. Addressing this problem, Cheng et al. [7] broke through the computational bottleneck by proposing a very fast objectness measure named BING (a short review is given in Sec. 3.2.1). Its extremely high efficiency (300 fps on a laptop) opens the way for using BING in many real-time applications, and we adopt BING as the basis of our study. The objectness proposed in this paper builds on top of BING by automatically adapting it to specific visual tracking tasks. This way, our new objectness measure enjoys both fast objectness inference and tracking-oriented accuracy improvement.

Model transfer/adaptation for tracking. Transfer learning deals with tasks where the target task can benefit from adapting knowledge or models learned from training samples whose distribution is correlated with, yet different from, the distribution of the testing samples [24]. In our study, we apply transfer learning to the SVM-based classifier used in our base objectness model (i.e., BING). In this respect, our work is closely related to and inspired by the work in [38, 33]. In particular, to increase the amount of transfer without penalizing the margin, the projective model transfer SVM was proposed in [3], and its regularization term was further extended for deformable source templates. The adapt-SVM [38] modifies the regularization term on the SVM model parameters to transfer knowledge from a single, already learned source model. In [33], in order to transfer knowledge from multiple source models and alleviate negative transfer, an adaptive least-squares support vector machine was proposed that can weight the source models differently.

Figure 2. Framework of integrating adaptive objectness for object tracking.

Transfer learning has been applied to visual tracking in several previous studies [11, 21, 36, 35]. In [21], a semi-supervised online boosting algorithm based on “covariate shift” is proposed for tracking. The approaches in [36, 35] use prior knowledge learned offline from real-world natural images to represent image patches. In [11], prior knowledge learned by online regression on auxiliary examples is transferred to assist the final decision. Our work differs in two main aspects: (1) we apply transfer learning to objectness instead of directly to tracking; and (2) we use an ℓ1-regularized adaptive SVM, which differs from the models used in previous studies.

3. Approach 
3.1. Overview 

Our proposed framework for enhancing visual tracking with adaptive objectness is summarized in Figure 2. In the following we give a brief description, and postpone the details to the subsequent subsections.

Given an input image sequence and the tracking initialization (e.g., a bounding box for the object-of-interest), our framework starts by learning the adaptive objectness, i.e., ADOBING, using an adaptive SVM. The learning takes two components as input: the tracking-specific training samples $\mathcal{D}$ extracted from the tracking initialization, and the base BING objectness algorithm represented by its parameter vector $\tilde{\mathbf{w}}$ from [7].

Let $T$ be the base tracker to be enhanced, $f_T(\cdot)$ the tracking confidence of $T$ for tracking candidates, and $f_O(\cdot)$ the learned ADOBING objectness. Then, during tracking, for each tracking candidate $c$, we fuse its base tracking confidence $f_T(c)$ and its adaptive objectness $f_O(c)$ in a weighted linear combination. The tracking result is then selected according to the fused confidence.
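The selection step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the base tracker returns one confidence per candidate where larger is better (a cost-based tracker would first negate its costs), that this confidence is min-max normalized (our assumed choice of normalization), and that the objectness scores are already on a comparable scale. The function names are hypothetical; λ = 0.1 is the value used in Sec. 4 for the weighted sum of Eq. (8).

```python
import numpy as np

def normalize_confidence(scores) -> np.ndarray:
    """Min-max scale raw tracker confidences to [0, 1] (assumed normalization)."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        # All candidates equally confident: no information from the tracker.
        return np.zeros_like(scores)
    return (scores - scores.min()) / span

def select_candidate(f_t, f_o, lam: float = 0.1) -> int:
    """Index of the candidate maximizing f_T(c) + lam * f_O(c)."""
    fused = normalize_confidence(f_t) + lam * np.asarray(f_o, dtype=float)
    return int(np.argmax(fused))
```

For instance, with tracker scores [0.80, 0.78, 0.20] and objectness [0.1, 0.9, 0.5], the second candidate wins with λ = 0.1, whereas λ = 0 keeps the base tracker's choice (the first candidate).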

3.2. Object-adaptive Objectness 
3.2.1 Review of BING 

In [7], a 64D binarized normed gradients (BING) feature was proposed for efficient objectness estimation. Motivated by the fact that objects are stand-alone things with well-defined closed boundaries and centers [2, 10], BING first resizes image windows to a small fixed size (8 × 8, chosen for computational reasons), and then uses the corresponding normed gradients to discriminate objects from non-object stuff in an image.
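To make the 64D feature concrete, the sketch below computes an 8 × 8 normed gradient feature for a single window with NumPy. The details are our assumptions rather than the exact procedure of [7]: the window is resized by nearest-neighbour sampling, gradients come from `np.gradient` on the resized window, and the normed gradient is |g_x| + |g_y| saturated at 255 so that each element fits in a BYTE. The function name is hypothetical.

```python
import numpy as np

def normed_gradient_feature(gray, box, size: int = 8) -> np.ndarray:
    """64-D normed-gradient feature of one window (illustrative sketch).

    gray: 2-D grayscale image; box: (x, y, w, h) in pixels.
    """
    x, y, w, h = box
    patch = np.asarray(gray, dtype=float)[y:y + h, x:x + w]
    # Nearest-neighbour resize to size x size (assumed resizing scheme).
    rows = (np.arange(size) * h // size).astype(int)
    cols = (np.arange(size) * w // size).astype(int)
    small = patch[np.ix_(rows, cols)]
    # Central differences in the interior, one-sided at the border.
    gy, gx = np.gradient(small)
    # Normed gradient |gx| + |gy|, saturated so it fits in one byte.
    g = np.minimum(np.abs(gx) + np.abs(gy), 255.0)
    return g.astype(np.uint8).reshape(-1)
```

A vertical step edge inside the window produces nonzero entries only in the columns next to the edge, while a uniform window yields an all-zero feature.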

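Figure 3. Objectness (BING) and adaptive objectness (ADOBING) for a specific tracking task. (a) Tracking input: the first frame with the initialization of the tracking target (red bounding box); (b) the objectness map of BING; (c) the objectness map of ADOBING.

In the training stage, BING trains a linear model $\mathbf{w} \in \mathbb{R}^{64}$ with a linear SVM. In the testing stage, the model $\mathbf{w}$ is approximated with $N_w$ binary vectors $\mathbf{a}_j^+ \in \{0,1\}^{64}$ and their complements $\overline{\mathbf{a}_j^+}$, weighted by $\beta_j$:

$$\mathbf{w} \approx \sum_{j=1}^{N_w} \beta_j \left(\mathbf{a}_j^+ - \overline{\mathbf{a}_j^+}\right).$$

The 64D normed gradient feature $\mathbf{g}$ (each element is stored as a BYTE value) is approximated by $N_g$ binarized normed gradients (BING) features:

$$\mathbf{g} \approx \sum_{k=1}^{N_g} 2^{8-k}\,\mathbf{b}_k,$$

where $\mathbf{b}_k \in \{0,1\}^{64}$ is the binary approximation of $\mathbf{g}$ at the $k$-th most significant bit. Then, the confidence score of an image window can be efficiently estimated using

$$s \approx \sum_{j=1}^{N_w} \beta_j \sum_{k=1}^{N_g} C_{j,k},$$

where $C_{j,k} = 2^{8-k}\left(2\langle \mathbf{a}_j^+, \mathbf{b}_k\rangle - |\mathbf{b}_k|\right)$ and $|\mathbf{b}_k|$ is the number of ones in $\mathbf{b}_k$; this follows from $\langle \mathbf{a}_j^+ - \overline{\mathbf{a}_j^+}, \mathbf{b}_k\rangle = 2\langle \mathbf{a}_j^+, \mathbf{b}_k\rangle - |\mathbf{b}_k|$. Since $\mathbf{a}_j^+$ and $\mathbf{b}_k$ are 64-dimensional, they can be stored as int64 values, and $C_{j,k}$ can be computed using fast bitwise AND and POPCNT SSE operators.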
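The following Python sketch illustrates why the score needs only bitwise AND and popcount. Two parts are assumptions for illustration: the binary vectors $\mathbf{a}_j$ are obtained by a simple greedy residual fit (take the sign of the residual and set $\beta_j$ to its least-squares weight), and Python integers stand in for the int64 registers. The function names are hypothetical. When $N_g = 8$ and the model is exactly binary, the approximate score equals $\mathbf{w}^\top \mathbf{g}$, which gives an easy sanity check.

```python
import numpy as np

def pack_bits(bits) -> int:
    """Pack a 64-element 0/1 vector into one integer (an int64 in C++)."""
    value = 0
    for i, b in enumerate(bits):
        if b:
            value |= 1 << i
    return value

def popcount(x: int) -> int:
    return bin(x).count("1")

def binarize_model(w, n_w: int):
    """Greedy fit w ~ sum_j beta_j * a_j with a_j in {-1, 1}^64 (assumed scheme)."""
    residual = np.asarray(w, dtype=float).copy()
    betas, a_plus = [], []
    for _ in range(n_w):
        a = np.where(residual >= 0, 1.0, -1.0)
        beta = float(a @ residual) / a.size   # least-squares weight, ||a||^2 = 64
        residual -= beta * a
        betas.append(beta)
        a_plus.append(pack_bits(a > 0))       # a_j^+; a_j = a_j^+ - complement
    return betas, a_plus

def bing_score(betas, a_plus, g, n_g: int = 4) -> float:
    """s ~ sum_j beta_j sum_k C_{j,k}, using only AND and popcount."""
    g = np.asarray(g, dtype=np.uint8)
    score = 0.0
    for k in range(1, n_g + 1):
        b_k = pack_bits((g >> (8 - k)) & 1)   # k-th most significant bit plane
        n_b = popcount(b_k)                   # |b_k|
        for beta, a in zip(betas, a_plus):
            c_jk = (1 << (8 - k)) * (2 * popcount(a & b_k) - n_b)
            score += beta * c_jk
    return score
```

For example, with $\mathbf{w} = 0.5\,\mathbf{a}$ for a random $\mathbf{a} \in \{-1,1\}^{64}$, a single binary vector ($N_w = 1$) and all eight bits ($N_g = 8$) reproduce $\mathbf{w}^\top \mathbf{g}$ exactly; smaller $N_g$ trades accuracy for speed.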

3.2.2 Learning Adaptive Objectness 

As briefly described in Sec. 3.1, we formulate the learning of adaptive objectness as follows. Given the training data $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$ and a previously learned linear model $\tilde{\mathbf{w}} \in \mathbb{R}^{64}$ (i.e., BING), where $\mathbf{x}_i \in \mathbb{R}^{64}$ is the normed gradient vector of an image patch and $y_i = \pm 1$ is its binary label, our task is to learn a linear model $\mathbf{w}$ adapted from $\tilde{\mathbf{w}}$.

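Objective function. To train the linear model $\mathbf{w}$, we use the adaptive SVM framework [38], so that the discrepancy between $\mathbf{w}$ and $\tilde{\mathbf{w}}$ is constrained while the classification error over $\mathcal{D}$ is minimized. Specifically, the regularizer $\|\mathbf{w}\|_1$ in the standard $\ell_1$-regularized linear SVM [40] is replaced by $\|\mathbf{w} - \tilde{\mathbf{w}}\|_1$, resulting in the following objective function:

$$\min_{\mathbf{w}} \; \|\mathbf{w} - \tilde{\mathbf{w}}\|_1 + C \sum_{i=1}^{N} \left(\max\left(0,\, 1 - y_i \mathbf{w}^\top \mathbf{x}_i\right)\right)^2, \qquad (1)$$

where $C > 0$ is the regularization weight that balances the loss against the regularizer.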
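A direct evaluation of objective (1) is handy for checking any solver. Below is a minimal NumPy sketch; the squared-hinge reading of the loss follows the derivatives in Eqs. (4) and (5), and the function name is hypothetical.

```python
import numpy as np

def adaptive_svm_objective(w, w_tilde, X, y, C=0.01) -> float:
    """Eq. (1): ||w - w_tilde||_1 + C * sum_i max(0, 1 - y_i w^T x_i)^2."""
    w, w_tilde = np.asarray(w, float), np.asarray(w_tilde, float)
    X, y = np.asarray(X, float), np.asarray(y, float)
    margins = 1.0 - y * (X @ w)            # b_i(w) in Sec. 3.2.2
    hinge = np.maximum(0.0, margins)
    return float(np.abs(w - w_tilde).sum() + C * np.sum(hinge ** 2))
```

Setting $\mathbf{w} = \tilde{\mathbf{w}}$ makes the first term vanish, so the objective then reduces to the loss of the base BING model on $\mathcal{D}$.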

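Solving (1) with Coordinate Descent. We employ the coordinate descent algorithm with a one-dimensional Newton direction [40] to solve the optimization in (1). The idea is to iteratively improve $\mathbf{w}$ and, in each iteration, to update $\mathbf{w}$ sequentially along each dimension. In the following, $n = 64$ is the feature dimension, $\mathbf{w}^{k}$ denotes $\mathbf{w}$ at the beginning of the $k$-th iteration, and $\mathbf{w}^{k,j}$ denotes $\mathbf{w}$ after its first $j-1$ dimensions have been updated in that iteration ($1 \le j \le n+1$), so that $\mathbf{w}^{k,1} = \mathbf{w}^{k}$ and $\mathbf{w}^{k,n+1} = \mathbf{w}^{k+1}$.

Algorithm 1 Coordinate Descent [40]
Input: $\mathbf{w}^{1} \in \mathbb{R}^{n}$
1: for $k = 1, 2, \ldots$, iterate until convergence do
2:   $\mathbf{w}^{k,1} \leftarrow \mathbf{w}^{k}$
3:   for $j = 1, 2, \ldots, n$ do
4:     Find $d$ by solving the sub-problem (2) exactly or approximately.
5:     $\mathbf{w}^{k,j+1} \leftarrow \mathbf{w}^{k,j} + d\,\mathbf{e}_j$
6:   end for
7:   $\mathbf{w}^{k+1} \leftarrow \mathbf{w}^{k,n+1}$
8: end for
9: return $\mathbf{w}^{k+1}$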

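Following [9], for the $i$-th training sample we define $b_i(\mathbf{w}) = 1 - y_i \mathbf{w}^\top \mathbf{x}_i$ and $I(\mathbf{w}) = \{\, i \mid b_i(\mathbf{w}) > 0 \,\}$. Then, a coordinate descent step for updating the $j$-th component of $\mathbf{w}^{k,j}$ is achieved by solving the following one-dimensional sub-problem:

$$\min_{z} \; g_j(z) = \left|w^{k,j}_j - \tilde{w}_j + z\right| - \left|w^{k,j}_j - \tilde{w}_j\right| + L_j(z; \mathbf{w}^{k,j}), \qquad (2)$$

where the subscript $j$ indicates the $j$-th element of the associated vector,

$$L_j(z; \mathbf{w}) = C \sum_{i \in I(\mathbf{w} + z\mathbf{e}_j)} b_i(\mathbf{w} + z\mathbf{e}_j)^2,$$

and $\mathbf{e}_j \in \mathbb{R}^{64}$ is the vector whose $j$-th element is 1 and all others are 0. The coordinate descent framework is summarized in Algorithm 1.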

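Note that $g_j(z)$ is not differentiable. To solve (2), we calculate the Newton direction by considering only the second-order approximation of the loss term $L_j(z; \mathbf{w}^{k,j})$, and solve

$$\min_{z} \; \left|w^{k,j}_j - \tilde{w}_j + z\right| - \left|w^{k,j}_j - \tilde{w}_j\right| + L_j'(0; \mathbf{w}^{k,j})\,z + \frac{1}{2} L_j''(0; \mathbf{w}^{k,j})\,z^2, \qquad (3)$$

where

$$L_j'(0; \mathbf{w}^{k,j}) = -2C \sum_{i \in I(\mathbf{w}^{k,j})} y_i\, x_{ij}\, b_i(\mathbf{w}^{k,j}) \qquad (4)$$

and

$$L_j''(0; \mathbf{w}^{k,j}) = 2C \sum_{i \in I(\mathbf{w}^{k,j})} x_{ij}^2. \qquad (5)$$

Note that $L_j''(0; \mathbf{w}^{k,j})$ in the above formula is a generalized second derivative [6], since $L_j(z; \mathbf{w}^{k,j})$ is not twice differentiable.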

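With a derivation similar to that in [40], it can be shown that (3) has the following closed-form solution:

$$d = \begin{cases} -\dfrac{L_j'(0; \mathbf{w}^{k,j}) - \operatorname{sgn}\!\left(S_j(\mathbf{w}^{k,j})\right)}{L_j''(0; \mathbf{w}^{k,j})}, & \text{if } \left|S_j(\mathbf{w}^{k,j})\right| > 1, \\[2ex] \tilde{w}_j - w^{k,j}_j, & \text{otherwise}, \end{cases} \qquad (6)$$

where $S_j(\mathbf{v}) = L_j'(0; \mathbf{v}) - L_j''(0; \mathbf{v})\,(v_j - \tilde{w}_j)$. In the first case the loss term is strong enough to move $w_j$ away from $\tilde{w}_j$; in the second case the step returns $w_j$ exactly to $\tilde{w}_j$, which is how the $\ell_1$ term keeps ADOBING close to BING.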
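The case analysis in (6) is easy to get wrong, so here is a scalar Python sketch of it (the function name is hypothetical). Writing $u = w^{k,j}_j - \tilde{w}_j$, it minimizes $|u + z| + L'_j z + \frac{1}{2}L''_j z^2$ over $z$, assuming $L''_j > 0$.

```python
def newton_direction(l1: float, l2: float, w_j: float, w_tilde_j: float) -> float:
    """Closed-form minimizer d of |u + z| + l1*z + 0.5*l2*z**2 (Eq. (6)).

    l1, l2: L'_j(0; w) and L''_j(0; w) from Eqs. (4)-(5); l2 must be > 0.
    """
    u = w_j - w_tilde_j
    s = l1 - l2 * u                  # S_j(w) in Eq. (6)
    if s > 1.0:
        return -(l1 - 1.0) / l2      # sgn(S_j) = +1
    if s < -1.0:
        return -(l1 + 1.0) / l2      # sgn(S_j) = -1
    return -u                        # step lands exactly on w_j = w_tilde_j
```

For example, with $L'_j = 3$, $L''_j = 2$ and $u = 0.4$ we get $S_j = 2.2$ and $d = -(3 - 1)/2 = -1$; with $L'_j = 0.5$, $L''_j = 2$ and $u = 0.1$, $|S_j| = 0.3 < 1$ and the step returns $w_j$ to $\tilde{w}_j$.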

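We then conduct a line search to check whether $\beta^t d$ satisfies the following sufficient decrease condition:

$$g_j(\beta^t d) - g_j(0) \le \sigma \beta^t \left( L_j'(0; \mathbf{w}^{k,j})\, d + \left|w^{k,j}_j - \tilde{w}_j + d\right| - \left|w^{k,j}_j - \tilde{w}_j\right| \right), \qquad (7)$$

where $\beta \in (0, 1)$, $t = 0, 1, 2, \ldots$, and $\sigma \in (0, 1)$. The first $\beta^t d$ that satisfies condition (7) is chosen as the solution of the sub-problem (2).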
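Putting Algorithm 1, the Newton direction (6) and the line search (7) together gives the NumPy sketch below. It follows the structure described above, but several choices are our assumptions rather than values reported in the paper: starting from $\mathbf{w}^1 = \tilde{\mathbf{w}}$, the defaults for β and σ, the iteration cap, the stopping tolerance, and the $10^{-12}$ floor on $L''_j$ that avoids division by zero when no sample is active. Only C = 0.01 comes from Sec. 4. The function name is hypothetical.

```python
import numpy as np

def train_adaptive_svm(X, y, w_tilde, C=0.01, max_iter=50, beta=0.5,
                       sigma=0.01, tol=1e-6):
    """Coordinate descent (Algorithm 1) for Eq. (1); illustrative sketch.

    X: (N, d) normed-gradient features; y: labels in {-1, +1};
    w_tilde: the base BING model. Returns the adapted model w.
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    w_tilde = np.asarray(w_tilde, float)
    w = w_tilde.copy()                        # assumed start: w^1 = w_tilde
    b = 1.0 - y * (X @ w)                     # b_i(w), updated incrementally

    def g(j, z):
        # g_j(z) of Eq. (2), evaluated at the current w^{k,j}.
        u = w[j] - w_tilde[j]
        b_new = b - z * y * X[:, j]
        return abs(u + z) - abs(u) + C * np.sum(np.maximum(0.0, b_new) ** 2)

    for _ in range(max_iter):
        max_step = 0.0
        for j in range(w.size):
            active = b > 0                                           # I(w)
            l1 = -2.0 * C * np.sum(y[active] * X[active, j] * b[active])  # (4)
            l2 = max(2.0 * C * np.sum(X[active, j] ** 2), 1e-12)         # (5)
            u = w[j] - w_tilde[j]
            s = l1 - l2 * u                                          # S_j
            if s > 1.0:
                d = -(l1 - 1.0) / l2
            elif s < -1.0:
                d = -(l1 + 1.0) / l2
            else:
                d = -u
            # Sufficient-decrease line search, Eq. (7): try beta^t * d.
            g0 = g(j, 0.0)
            delta = l1 * d + abs(u + d) - abs(u)
            step = 1.0
            while g(j, step * d) - g0 > sigma * step * delta and step > 1e-10:
                step *= beta
            z = step * d
            w[j] += z
            b -= z * y * X[:, j]
            max_step = max(max_step, abs(z))
        if max_step < tol:
            break
    return w
```

Starting from $\tilde{\mathbf{w}}$ makes the role of C visible: when C is small enough that $|S_j| \le 1$ for every coordinate, each step is $d = 0$ and the solver leaves BING unchanged, so C controls how far ADOBING may move away from BING.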

3.3. Encoding Adaptive Objectness for Tracking 

As illustrated in Fig. 2, in addition to the learning algorithm, three components are needed for encoding objectness for tracking: preparing training samples, selecting base trackers, and fusing each base tracker with the proposed adaptive objectness.


Generating training samples $\mathcal{D}$. For each sequence, we use the first frame to generate the training samples $\mathcal{D}$ in a sliding-window fashion over the entire image. An image patch is labeled as positive if its overlap with the ground truth is greater than a predefined threshold; otherwise, it is labeled as negative. Figure 3 presents the confidence maps of the original BING and the learned adaptive objectness (ADOBING) for a specific tracking task.
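A minimal Python sketch of the sample generation, for illustration only. The paper does not give the window scales, stride or overlap threshold, so the values below (scales of 0.8, 1.0 and 1.25 times the ground-truth size, a stride of 8 pixels, and a threshold of 0.5) are hypothetical. Overlap is measured as intersection over union, the same ratio as the overlap score $S$ in Sec. 4.1; boxes are (x, y, w, h) tuples, and the function names are hypothetical.

```python
def iou(a, b) -> float:
    """Overlap |a ∩ b| / |a ∪ b| of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def label_windows(image_w, image_h, gt_box, scales=(0.8, 1.0, 1.25),
                  stride=8, threshold=0.5):
    """Yield (box, label) for sliding windows over the first frame."""
    _, _, gw, gh = gt_box
    for s in scales:
        w, h = int(round(gw * s)), int(round(gh * s))
        for y in range(0, image_h - h + 1, stride):
            for x in range(0, image_w - w + 1, stride):
                box = (x, y, w, h)
                yield box, (+1 if iou(box, gt_box) > threshold else -1)
```

The normed gradient features $\mathbf{x}_i$ of these windows, together with their labels $y_i$, would then form $\mathcal{D}$.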

In this study we limit the samples to the first frame, mainly for two reasons. First, theoretically, only the initialization is guaranteed to be the true target, and tracking results from the second frame onward can be polluted. Second, although we aim to adjust the original objectness to the specific tracking sequence, we want to avoid overfitting the objectness. In other words, using a limited number of samples balances the generic property and the tracking specificity of the proposed adaptive objectness. That being said, in practice, one may collect more samples from several initial frames for further improvement.


Selecting base trackers. It is impractical to use all existing tracking algorithms to validate the efficacy of integrating objectness; instead, we select top-ranked trackers from recent tracking evaluations [37, 25, 20]. More specifically, we first create an initial set of trackers that ranked within the top 10 in any of these evaluations. Then, from these trackers, we choose those with open-source implementations in C/C++, since BING and our ADOBING are both implemented in C/C++. This selection provides six base trackers, namely BSBT [28], Frag [1], MIL [4], OAB [12], SemiT [13], and Struck [14].

Furthermore, several recently proposed trackers report higher performance on the above-mentioned benchmarks, such as [11, 15]. From these trackers, we select TGPR [11] since it has a C/C++ implementation available.

In summary, we select seven state-of-the-art trackers as base trackers. As will become clear in Sec. 4, although they have already achieved top performance in previous benchmark evaluations, their performance can still be boosted by integrating the proposed adaptive objectness, producing the best results known so far.

Encoding objectness. Given a base tracker denoted by $T$, one may improve it by integrating objectness in a tracker-specific way so as to maximize the benefit from objectness. However, in this paper, we are more interested in showing that the benefit provided by objectness is generic. Therefore, we follow a straightforward strategy that directly combines the tracking confidence from $T$ with the objectness measure. This strategy is very general and applicable to all selected base trackers as well as most other modern trackers.

Roughly speaking, for a base tracker $T$, when a new frame arrives, a set of candidates $\mathcal{C} = \{c_i\}$ is first constructed to identify the tracking target; then the tracking confidence $f_T(\cdot)$ is applied to each candidate; finally, the candidate with the maximum confidence value is selected as the tracking result. To integrate objectness (either BING or ADOBING), we simply replace $f_T$ with an objectness-enhanced confidence $f_{OT}$, a weighted sum of $f_T$ and $f_O$, where $f_O(\cdot)$ is the objectness measure of a candidate. In particular, for a candidate $c_i \in \mathcal{C}$, we have

$$f_{OT}(c_i) = f_T(c_i) + \lambda f_O(c_i), \qquad (8)$$

where $\lambda$ is a constant weight.

The above strategy has been applied to all seven base trackers. For each base tracker, we normalize its original confidence (probability, cost, etc.) before applying the strategy. We emphasize again that, despite its simplicity, the strategy consistently boosts all base trackers in our experiments when using the proposed adaptive objectness (ADOBING).

4. Experiments

We evaluate the proposed objectness-enhanced tracking algorithms on two recently published benchmarks: the CVPR2013 benchmark [37] and the Princeton Tracking Benchmark [27]. For the base trackers, we use their default parameter settings. For the adaptive SVM, we set $C = 0.01$ in Eq. (1); for combining the confidence of a base tracker with the objectness measure, we set $\lambda = 0.1$ in Eq. (8). These parameter settings are used throughout all the experiments.

Figure 4. Success and precision plots for all ADOBING-enhanced trackers and base trackers on the CVPR2013 benchmark.

Table 1. The performance of BING- or ADOBING-enhanced trackers and their corresponding base trackers in terms of AUC on the CVPR2013 benchmark. The best result for each tracker is marked with *.

Tracker   | BSBT [28] | Frag [1] | MIL [4] | OAB [12] | SemiT [13] | Struck [14] | TGPR [11] | Average
Base      | 0.327     | 0.350    | 0.355   | 0.352    | 0.372      | 0.482       | 0.530     | 0.395
Base+BING | 0.320     | 0.374    | 0.358   | 0.360    | 0.382      | 0.496       | 0.545     | 0.405
Base+ADOB | 0.335*    | 0.375*   | 0.375*  | 0.379*   | 0.395*     | 0.511*      | 0.547*    | 0.417*

In the experiments, in addition to testing each base tracker along with its ADOBING-enhanced version, we also run a BING-enhanced version that uses the original BING objectness. In the following, for a base tracker “T”, we use “T+ADOB” and “T+BING” to denote its two objectness-enhanced versions.

4.1. CVPR2013 Visual Tracking Benchmark 

The CVPR2013 Visual Tracking Benchmark [37] includes 50 fully annotated sequences. To further understand the strengths and weaknesses of tracking algorithms, these sequences are categorized according to 11 challenging factors: illumination variation (IV), scale variation (SV), occlusion (OCC), deformation (DEF), motion blur (MB), fast motion (FM), in-plane rotation (IPR), out-of-plane rotation (OPR), out-of-view (OV), background clutter (BC), and low resolution (LR).

We follow the protocol in [37] for evaluation. One metric is the center location error (CLE), defined as the Euclidean distance between the center of the tracked target and the center of the manually labeled ground truth. The average CLE over all frames of a sequence can be used to measure performance on that sequence. However, this measurement is not meaningful when the tracker loses the target completely. The precision plot addresses this issue by showing the percentage of frames whose CLEs are within a given threshold. As in [37], the precision score at a threshold of 20 pixels is used to rank the trackers in our evaluation.
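The precision score can be computed as follows. This is a minimal NumPy sketch assuming boxes given as (x, y, width, height) rows and counting a CLE exactly equal to the threshold as a success; the function names are hypothetical.

```python
import numpy as np

def center_location_error(pred_boxes, gt_boxes) -> np.ndarray:
    """Per-frame Euclidean distance between box centers."""
    p = np.asarray(pred_boxes, float)
    g = np.asarray(gt_boxes, float)
    pc = p[:, :2] + p[:, 2:] / 2.0
    gc = g[:, :2] + g[:, 2:] / 2.0
    return np.linalg.norm(pc - gc, axis=1)

def precision_score(pred_boxes, gt_boxes, threshold=20.0) -> float:
    """Fraction of frames whose CLE is within `threshold` pixels."""
    cle = center_location_error(pred_boxes, gt_boxes)
    return float(np.mean(cle <= threshold))
```

For four frames with CLEs of 0, 15, 30 and 25 pixels, the precision score at 20 pixels is 2/4 = 0.5.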

Another widely used metric is based on bounding box overlap. For each frame, given the tracking output bounding box $r_t$ and the ground truth bounding box $r_g$, the overlap score $S = \frac{|r_t \cap r_g|}{|r_t \cup r_g|}$ is used to measure tracking success, where $|\cdot|$ denotes the area. To quantify the performance of a tracker on a sequence of frames, we calculate the percentage of frames whose overlap score is larger than a given threshold. The success plot is then generated by varying the threshold from 0 to 1, and the Area Under Curve (AUC) of the success plot is used to rank the trackers. Compared with the precision at the single threshold of 20 pixels, AUC measures the overall performance across all thresholds and is therefore more comprehensive, so we mainly use AUC in our analysis.
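The success plot and its AUC can be sketched in NumPy as below. We approximate the AUC by the mean success rate over 21 evenly spaced thresholds in [0, 1]; the number of thresholds is our assumption, and with the strict "larger than" test a perfect tracker scores 20/21 rather than 1 under this approximation. Boxes are (x, y, width, height) rows, and the function names are hypothetical.

```python
import numpy as np

def overlap_scores(pred_boxes, gt_boxes) -> np.ndarray:
    """S = |r_t ∩ r_g| / |r_t ∪ r_g| for each frame."""
    p = np.asarray(pred_boxes, float)
    g = np.asarray(gt_boxes, float)
    x1 = np.maximum(p[:, 0], g[:, 0])
    y1 = np.maximum(p[:, 1], g[:, 1])
    x2 = np.minimum(p[:, 0] + p[:, 2], g[:, 0] + g[:, 2])
    y2 = np.minimum(p[:, 1] + p[:, 3], g[:, 1] + g[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = p[:, 2] * p[:, 3] + g[:, 2] * g[:, 3] - inter
    return inter / union

def success_auc(pred_boxes, gt_boxes, n_thresholds=21):
    """Success rate at evenly spaced thresholds in [0, 1] and its mean (AUC)."""
    s = overlap_scores(pred_boxes, gt_boxes)
    thresholds = np.linspace(0.0, 1.0, n_thresholds)
    success = np.array([np.mean(s > t) for t in thresholds])
    return success, float(success.mean())
```

A tracker that never overlaps the target gets an AUC of 0, and the curve for any tracker is non-increasing in the threshold.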

Results. Figure 4 shows the success and precision plots of the seven ADOBING-enhanced trackers and their corresponding base trackers. In both plots, the ADOBING-enhanced trackers are consistently better than their corresponding base trackers. Table 1 gives a quantitative comparison of the base trackers and the two versions of objectness-enhanced trackers. The results show that the proposed adaptive objectness (i.e., ADOBING) brings more benefit than BING for all base trackers (average AUC 0.417, versus 0.405 with BING and 0.395 for the base trackers). Figure 5 shows example tracking results where ADOBING helps tracking.

It is worth noting that the ADOBING-enhanced TGPR (precision score for ranking: 0.7678) outperforms the previously reported best results: 0.732 by [15] and 0.733 by [11].

It is unrealistic to expect that tracking can be entirely solved by integrating objectness. That said, it is useful to investigate the failures to better understand the proposed trackers. Figure 6 shows some typical failures observed in our experiments. Roughly speaking, these failures are mainly due to challenges that trouble most existing trackers.

Table 2. The performance (AUC) gain under different challenging factors. The attributes are ordered according to the gain.

Attribute | OV    | MB    | FM    | IPR   | DEF   | OCC   | OPR   | IV    | BC    | SV    | LR
Base      | 0.370 | 0.351 | 0.362 | 0.381 | 0.364 | 0.336 | 0.369 | 0.371 | 0.380 | 0.348 | 0.301
Base+ADOB | 0.405 | 0.376 | 0.387 | 0.403 | 0.386 | 0.358 | 0.390 | 0.390 | 0.397 | 0.361 | 0.312
Gain      | 0.035 | 0.025 | 0.025 | 0.022 | 0.022 | 0.022 | 0.021 | 0.019 | 0.017 | 0.013 | 0.011

Figure 5. Examples where adaptive objectness (ADOBING) helps tracking. The results of the base tracker are shown in green, and the results using ADOBING in red. The name of the base tracker is shown under the results: (a) BSBT, (b) Frag, (c) MIL, (d) OAB, (g) TGPR.

Attribute-based performance analysis. Taking advantage of the attribute annotation of each sequence, we analyze the performance of the object-adaptive objectness for visual tracking under different challenging factors. Table 2 summarizes the performance gain from using ADOBING. The AUC for base trackers under each attribute is calculated by averaging the AUC of all the base trackers on the corresponding subset of sequences; the AUC for the ADOBING-enhanced trackers is obtained in the same way.
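As an arithmetic check of Table 2, the gain column and the attribute order can be reproduced from the two averaged AUC rows. The values below are copied from Table 2; the dictionary and function names are hypothetical.

```python
# Averaged AUC per attribute: (Base, Base+ADOB), copied from Table 2.
table2 = {
    "OV": (0.370, 0.405), "MB": (0.351, 0.376), "FM": (0.362, 0.387),
    "IPR": (0.381, 0.403), "DEF": (0.364, 0.386), "OCC": (0.336, 0.358),
    "OPR": (0.369, 0.390), "IV": (0.371, 0.390), "BC": (0.380, 0.397),
    "SV": (0.348, 0.361), "LR": (0.301, 0.312),
}

def attribute_gains(table):
    """Return (attribute, gain) pairs sorted by decreasing AUC gain."""
    gains = {k: round(adob - base, 3) for k, (base, adob) in table.items()}
    # sorted() is stable, so tied gains keep the order of the table.
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
```

Every gain is positive, from 0.035 for out-of-view (OV) down to 0.011 for low resolution (LR), matching the discussion that follows.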

From the results, we can see that ADOBING consistently helps visual tracking algorithms under all the challenging factors. The largest performance gain is achieved for out-of-view (OV). A possible reason is that when an object moves out of view, only part of it remains visible, and these partial objects may incorrectly update the base tracker; by contrast, ADOBING helps inhibit such partial objects since they usually have low objectness. At the other end, the gain for low resolution (LR) is relatively small, which can be attributed to the lack of reliable guidance from the base trackers due to the weak appearance information.

4.2. Princeton Tracking Benchmark 

The recently proposed Princeton Tracking Benchmark (PTB) [27] contains 100 RGBD sequences divided into 11 categories. Among these 100 sequences, the ground truth of 5 is released for parameter tuning and the rest is withheld for evaluation. Although the purpose of this dataset is to evaluate RGBD trackers, the available RGB sequences and the evaluation website¹ make it suitable for verifying the usefulness of objectness for visual tracking. Note that due to the limitations of current depth acquisition techniques, all the sequences are captured indoors. We follow the protocol in [27] and submit the results to the PTB website for evaluation. The success rate is calculated by thresholding the overlap between the tracked bounding box of the target and the ground truth.

Table 3 summarizes the overall success rate and the success rate under each category for the evaluated trackers. For the base trackers MIL, SemiT and Struck, which have already been evaluated in [27], we use the most recent results from the evaluation website directly. From the results, we can see that all seven base trackers benefit from integrating the proposed adaptive objectness, an observation consistent with our experiments on the CVPR2013 benchmark.

¹ http://vision.princeton.edu/projects/2013/tracking/

Figure 6. Some failures observed in our experiments involving different challenging factors, e.g., (c) failure due to occlusion (tracking a face) and (d) failure due to low resolution (tracking a head). The legend shows the corresponding base trackers: BSBT, Frag, MIL, OAB, SemiT, Struck, TGPR.

Table 3. Evaluation results on the Princeton Tracking Benchmark (success rate, SR). The category columns are grouped as target type (human, animal, rigid), target size (large, small), movement (slow, fast), occlusion (yes, no) and motion type (passive, active).

Algorithm   | overall SR | human | animal | rigid | large | small | slow | fast | occ. yes | occ. no | passive | active
TGPR+ADOB   | 0.489      | 0.39  | 0.51   | 0.59  | 0.47  | 0.50  | 0.63 | 0.43 | 0.37     | 0.65    | 0.55    | 0.46
TGPR        | 0.472      | 0.36  | 0.51   | 0.58  | 0.46  | 0.48  | 0.62 | 0.41 | 0.35     | 0.65    | 0.56    | 0.44
Struck+ADOB | 0.447      | 0.38  | 0.47   | 0.52  | 0.44  | 0.45  | 0.58 | 0.40 | 0.30     | 0.65    | 0.52    | 0.42
Struck      | 0.444      | 0.35  | 0.47   | 0.53  | 0.45  | 0.44  | 0.58 | 0.39 | 0.30     | 0.64    | 0.54    | 0.41
Frag+ADOB   | 0.429      | 0.38  | 0.43   | 0.49  | 0.45  | 0.41  | 0.62 | 0.35 | 0.35     | 0.54    | 0.49    | 0.41
Frag        | 0.412      | 0.39  | 0.41   | 0.44  | 0.46  | 0.37  | 0.58 | 0.35 | 0.33     | 0.52    | 0.46    | 0.39
OAB+ADOB    | 0.405      | 0.32  | 0.44   | 0.49  | 0.38  | 0.43  | 0.53 | 0.35 | 0.29     | 0.57    | 0.51    | 0.37
MIL+ADOB    | 0.403      | 0.32  | 0.51   | 0.44  | 0.39  | 0.41  | 0.54 | 0.35 | 0.29     | 0.56    | 0.48    | 0.37
SemiT+ADOB  | 0.385      | 0.34  | 0.42   | 0.42  | 0.37  | 0.40  | 0.52 | 0.33 | 0.30     | 0.51    | 0.49    | 0.34
OAB         | 0.382      | 0.28  | 0.46   | 0.46  | 0.35  | 0.41  | 0.52 | 0.32 | 0.27     | 0.54    | 0.48    | 0.35
MIL         | 0.355      | 0.32  | 0.37   | 0.38  | 0.37  | 0.35  | 0.46 | 0.31 | 0.26     | 0.49    | 0.40    | 0.34
BSBT+ADOB   | 0.296      | 0.25  | 0.26   | 0.37  | 0.31  | 0.29  | 0.43 | 0.24 | 0.26     | 0.35    | 0.43    | 0.25
BSBT        | 0.285      | 0.22  | 0.30   | 0.36  | 0.27  | 0.29  | 0.42 | 0.23 | 0.26     | 0.32    | 0.43    | 0.23
SemiT       | 0.283      | 0.22  | 0.33   | 0.33  | 0.24  | 0.32  | 0.38 | 0.24 | 0.25     | 0.33    | 0.42    | 0.23

Table 4. The performance of BING- or ADOBING-enhanced trackers and their corresponding base trackers in terms of success rate on PTB. The best result for each tracker is marked with *.

Tracker   | BSBT [28] | Frag [1] | MIL [4] | OAB [12] | SemiT [13] | Struck [14] | TGPR [11] | Average
Base      | 0.285     | 0.412    | 0.355   | 0.382    | 0.283      | 0.444       | 0.472     | 0.376
Base+BING | 0.305*    | 0.434*   | 0.408*  | 0.388    | 0.367      | 0.434       | 0.505*    | 0.406
Base+ADOB | 0.296     | 0.429    | 0.403   | 0.405*   | 0.385*     | 0.447*      | 0.489     | 0.408*

Table 4 compares each base tracker with its two objectness-enhanced versions. On one hand, it again confirms the consistent improvement from the proposed ADOBING objectness; on the other hand, it shows that the improvement from ADOBING is similar to that from the original BING (average success rate 0.408 vs. 0.406). That said, BING is less stable, since it actually hurts the performance when Struck is used as the base tracker (0.434 vs. 0.444).

5. Conclusion 

In this paper, we propose to use adaptive objectness to assist object tracking. Based on the recently proposed fast objectness algorithm BING, we have designed a tracking-adaptive objectness named ADOBING through an adaptive SVM. ADOBING effectively adjusts the generic objectness estimation to take tracking-specific information into consideration. Consequently, when integrated into a base tracker, it can help reduce the chance of drifting by avoiding tracking candidates that do not look like an object. To validate the idea, ADOBING is integrated into seven highly ranked trackers chosen from recently published evaluations. These trackers are then tested on two public benchmarks comprising 150 sequences in total. The results show that integrating ADOBING, even in a straightforward way, consistently improves these trackers.